RO-S-68-  1 




A VECTOR SPACE DERIVATION, USING
DYADS, OF WEIGHTED LEAST SQUARES
FOR CORRELATED NOISE

(Special Report)

by 

James  S.  Pappas 


JUNE  1968 




U.  S.  ARMY  TEST  AND  EVALUATION  COMMAND 
ANALYSIS  AND  COMPUTATION  DIRECTORATE 
DEPUTY  FOR  NATIONAL  RANGE  OPERATIONS 
WHITE  SANDS  MISSILE  RANGE,  NEW  MEXICO 


DISTRIBUTION OF THIS DOCUMENT IS UNLIMITED


Reproduced by the
CLEARINGHOUSE
for Federal Scientific & Technical
Information Springfield Va. 22151




CONTENTS

ABSTRACT
INTRODUCTION
NOTATION
SECTION I. PRELIMINARY DISCUSSION
SECTION II. ESTIMATION OF A CONSTANT SCALAR PLUS NOISE
SECTION III. ESTIMATION OF A CONSTANT VECTOR PLUS NOISE
SECTION IV. POLYNOMIAL PARAMETER ESTIMATION
SECTION V. MULTI-VARIABLE POLYNOMIAL
APPENDIX A. MATRIX TRACE PROPERTIES
APPENDIX B. GRADIENTS OF SCALARS WITH RESPECT TO MATRICES
APPENDIX C. MINIMIZATION
REFERENCES


ABSTRACT 


Matrix analysis and recursive matrix computing subroutines offer hope of relieving the current computer data deluge. Classical weighted least squares for multi-variable parameter estimation in the presence of correlated noise are developed in a geometrical vector space setting. Rank-one matrices, or dyads, are used extensively, especially in obtaining gradients of traces of variance matrices.



INTRODUCTION 


This report develops the classical weighted least-squares theory in a vector-space setting. Computer programs and subroutines which operate on larger packages of data, in the form of data-matrices and large arrays of system variables as Euclidean vectors, offer great hope of relieving the current data deluge.

Our  current  computer  programming  procedures  are  based  on  arithmetic 
operations  on  algebraic  field  elements  such  as  addition,  multiplication, 
division,  and  integration  of  scalars.  The  state  space  formulation 
requires  arithmetic  units  which  operate  on  matrices  as  elements  of  an 
algebraic  ring,  vector  space,  etc. 

In the classical weighted least squares theory one works, analytically and computer-wise, with tedious summation after summation of scalar variables. In the modern theory one works, analytically and computer-wise, with vector space theory, square and rectangular data matrices of full and non-full rank, and their inverses and pseudo-inverses.

Computer economy in data storage and computing time is sought through the application of clever recursive matrix numerical analysis algorithms.

This report is the second of a series developing the modern state vector recursive estimation theory. The essential areas for understanding the theory are:

1.  Unweighted Least Squares Parameter-Vector Estimation and the Variance-of-the-Estimate Matrix.

2.  Discrete Matrix Recursive Methods Applied to (1) for Real Time (on line) Computer Processing.

3.  Weighted Least Squares Parameter Estimation and Variance-of-the-Estimate Matrix for Correlated Noise.

4.  Discrete Matrix Recursive Methods Applied to (3) for Real Time Computer Processing.

5.  Recursive Weighted Least Squares State-Vector Estimation Theory (Kalman Theory).

Items (1) and (2) are completed and published in reference (4). Item (3) is the contents of the current report. Items (4) and (5) are near completion.


NOTATION 


SECTION I. PRELIMINARY DISCUSSION

Consider the system of two vector equations

\[ x(k+1)\rangle = \Phi(k+1,k)\,x(k)\rangle + f(k)\rangle + u(k)\rangle \]

and

\[ z(k)\rangle = H(k)\,x(k)\rangle + v(k)\rangle \]

where:

$x(k)\rangle$, $x(k+1)\rangle$ are p-dimensional column vectors describing the states at stage k and stage k+1.

$\Phi(k+1,k)$ is a p×p state transition matrix.

$f(k)\rangle$ is a p-dimensional deterministic forcing vector for which we can write a vector function.

$u(k)\rangle$ is a p-dimensional uncertainty or noise vector; it is the composite of the random noises and the variables we fail to model.

$z(k)\rangle$ is the m-dimensional observation vector; m is less than or equal to p.

$H(k)$ is the known m×p matrix describing how the state vector is functionally related to the observation vector (if the instruments were noise free).

$v(k)\rangle$ is an m-dimensional additive instrument noise vector.


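The special case of

\[ f(k)\rangle = u(k)\rangle = 0\rangle \]

and

\[ \Phi(k+1,k) = I \]

and

\[ H(k) = H_0, \ \text{a constant matrix,} \tag{5} \]

yields

\[ x(2)\rangle = I\,x(1)\rangle, \qquad x(3)\rangle = I\,x(2)\rangle = x(1)\rangle, \qquad \ldots, \qquad x(k)\rangle = x(1)\rangle \ \text{for all } k, \tag{6} \]

and

\[ z(k)\rangle = H_0\,x(1)\rangle + v(k)\rangle. \tag{7} \]

Define the vector

\[ a_0\rangle = H_0\,x(1)\rangle \tag{8} \]

with dimensions (m×1) = (m×p)(p×1), and

\[ z(k)\rangle = a_0\rangle + v(k)\rangle. \tag{9} \]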

The block diagram of equation (9) is shown in Figure 1.

Fig. 1  Block Diagram of Vector Summing Junction (constant vector plus noise)

The block diagram of equation (7) is shown in Figure 2.

Fig. 2  Block Diagram of Eq. 7 as Device with Matrix Gain plus Additive Noise

The graph of equation (9) is a random dispersion about a constant vector in m-space as shown in Figure 3.

Fig. 3  Graph of Eq. 9: k Noisy Vectors in m-Space About a Point

The graph of equation (7) is shown in Figure 4 as a transformation on a constant vector $x(1)\rangle$ in p-space to a sub-space of dimension m plus an additive m-dimensional noise vector.

Fig. 4  Graph of Equation 7

Noise Free Conditions

The  noise-free  condition  for  the  multivariable  case  is  discussed 
merely  to  motivate  some  algebraic  concepts  to  compare  with  the  almost 
trivial  scalar  case. 


When the noise is zero in Fig. 3 and Eq. (9) we have

\[ z(k)\rangle = a_0\rangle \tag{10} \]

hence one measurement of $z(k)\rangle$ is adequate to find $a_0\rangle$.

When the noise is zero in Eq. (7),

\[ z(k)\rangle = H_0\,x(1)\rangle \tag{11} \]

and two interpretations of interest can occur.

Interpretation I.  Input Measurement.

The m-dimensional vector equation (11), when $z(k)\rangle$ is a known vector and $H_0$ is a known m×p "gain" matrix, presents the problem to solve for the p-dimensional input vector $x(1)\rangle$, where p > m.

When p = 1, that is the scalar case,

\[ a_0 = h_0\,x(1) \tag{12} \]

hence

\[ h_0^{-1}\,a_0 = x(1). \tag{13} \]

The scalar $h_0$ has an inverse; however, the m×p matrix $H_0$ does not have a conventional inverse except when p = m and $H_0$ is full rank. When p is greater than m the pseudo-inverse is a valuable tool to obtain part of the solution.
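As a hedged illustration of the remark on the pseudo-inverse (the symbols $x_{\min}\rangle$ and $n\rangle$ are introduced here for the sketch and are not the report's notation): if $H_0$ has full row rank m < p, the standard right pseudo-inverse gives the minimum-norm solution, and every other solution differs from it by a vector that $H_0$ maps to zero. This is the sense in which only "part of the solution" is recovered.

```latex
\begin{align*}
x_{\min}\rangle &= H_0^{T}\,(H_0 H_0^{T})^{-1}\,a_0\rangle,
  & H_0\,x_{\min}\rangle &= a_0\rangle, \\
x\rangle &= x_{\min}\rangle + n\rangle,
  & H_0\,n\rangle &= 0\rangle .
\end{align*}
```

The component $n\rangle$ in the null space of $H_0$ is not observable from the single noise-free measurement.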

Interpretation II.  Instrument Gain Calibration.

The second case of interest for the noise free case is when the output $z(k)\rangle$ is known and we know the inputs; then the problem is to solve for the gain matrix. We have

(14)

In equation (11) we have 1 vector equation (or m scalar equations) with m×p unknowns. If we use p different known inputs then

or packaging the data as an m×p matrix

(16)

(17)


Case  III.  Scalar  Polynomials. 

The approximation of a function with a polynomial using unweighted and weighted least squares considers

\[ z_k = a_0 + a_1 x_k + a_2 x_k^{2} + \cdots + a_{p-1} x_k^{p-1} + v_k \tag{22} \]

\[ z_k = \hat a_0 + \hat a_1 x_k + \hat a_2 x_k^{2} + \cdots + \hat a_{p-1} x_k^{p-1} + e_k \tag{23} \]

or in a vector-space setting

Define the p-dimensional parameter row vectors as

\[ \langle\beta = (\beta_1, \beta_2, \ldots, \beta_p) = (a_0, a_1, \ldots, a_{p-1}) \tag{26} \]

and

\[ \langle\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p) = (\hat a_0, \hat a_1, \ldots, \hat a_{p-1}) \]

and the p-dimensional column vector of data as

(27)

In vector-matrix form equations (33) and (34) become

\[ \langle z = \langle\beta\,F + \langle v = \langle\hat\beta\,F + \langle e \tag{37} \]

with dimensions (1×k) = (1×p)(p×k) + (1×k). If we transpose to a column vector

\[ z\rangle = F^{T}\beta\rangle + v\rangle = F^{T}\hat\beta\rangle + e\rangle \tag{38} \]

Note that the vector equation (38) looks like equation (7) except that m is replaced by k, the sample size, which can become quite large, whereas m is equal to and generally less than p (since we can not instrument all variables of interest). We may also consider the matrix H as a map onto a sub-space, whereas F is a map depending on the size of k.

Case IV.  Vector Polynomials

Approximating components of a vector with time polynomials, for example missile position vector, velocity vector, etc., with n variables

\[ z_1(k) = \beta_{11} + \beta_{21} x_k + \cdots + v_{1k} \]

\[ z_n(k) = \beta_{1n} + \beta_{2n} x_k + \beta_{3n} x_k^{2} + \cdots + v_{nk} \tag{39} \]

or as inner products

\[ z_1(k) = \langle\beta^{1} f(k)\rangle + v_{1k} = \langle\hat\beta^{1} f(k)\rangle + e_{1k} \]

\[ z_n(k) = \langle\beta^{n} f(k)\rangle + v_{nk} = \langle\hat\beta^{n} f(k)\rangle + e_{nk} \tag{40} \]

The kth observation of the n-dimensional vector is

and, for all k observations,

\[ Z = B\,F + V = \hat B\,F + E \tag{45} \]

with dimensions (n×k) = (n×p)(p×k) + (n×k).

The next section develops the concepts of variance matrices around Case I (the simplest case we can discuss) and applies the variance to weighted least squares.

The  two  age-old  techniques  of  unweighted  and  weighted  least  squares 
are  developed  in  a  vector  space  setting. 


SECTION II


ESTIMATION OF A CONSTANT SCALAR PLUS NOISE

Summarizing, the unweighted estimate by equation (11) is

\[ \hat a(k) = \langle z\,1\rangle\,\frac{1}{k} \tag{21} \]

and the square of the error in the estimate of the parameter is by equation (17)

\[ \tilde a^{2}(k) = \frac{1}{k^{2}}\,\langle 1\,v\rangle\langle v\,1\rangle \tag{22} \]

Note that we can consider the arithmetic mean (unweighted) case as an equi-weight case where each data-point is weighted by 1/k, or as a sequence (or vector) of weights

\[ \langle w = (1/k,\ 1/k,\ \ldots,\ 1/k) \tag{23} \]


We may now ask the question: Can we obtain an estimate of $a_0$ which is "better" than equation (21) and which has a smaller numerical value of error-square of equation (22)?

The next section will derive a sequence of weights such that a weighted estimate of the parameter is a linear combination of the weights and the data, that is

\[ \hat a_w = z_1 w_1 + z_2 w_2 + \cdots + z_k w_k \tag{24} \]

In a vector-space setting, we seek to find a column vector of weights $w\rangle$ such that

\[ \hat a_w = \langle z\,w\rangle \tag{25} \]

and that on the average equation (24) is "better in some sense" than equation (21).
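The relation between the arithmetic mean and a general weighted estimate can be sketched in a few lines. This is a minimal illustration, not the report's program: the data values, the unequal weights, and the helper name `inner` are all hypothetical.

```python
def inner(row, col):
    """Inner product <row col> of two equal-length sequences."""
    if len(row) != len(col):
        raise ValueError("vectors must have the same dimension")
    return sum(r * c for r, c in zip(row, col))


z = [10.2, 9.7, 10.4, 9.9]          # hypothetical data row vector <z
k = len(z)
ones = [1.0] * k                     # summation vector 1>
equi = [1.0 / k] * k                 # equi-weight vector of eq. (23)

a_unweighted = inner(z, ones) / k    # arithmetic mean, eq. (21)
a_equi = inner(z, equi)              # eq. (25) with equal weights

w = [0.4, 0.3, 0.2, 0.1]             # hypothetical unequal weights summing to 1
a_weighted = inner(z, w)             # weighted estimate, eq. (25)
```

With equal weights the two forms agree; any other weight vector that sums to one gives a different linear combination of the same data.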
WEIGHTED LEAST SQUARES


The  application  of  weighted  least-squares  and  the  derivation  of  the 
equations  are  developed  in  this  section  for  the  scalar  case.  The  appli¬ 
cation  to  the  observational  data  in  the  context  of  this  report  is  equivalent 
to  a  statistical  calibration  of  the  instrument  (that  is  a  calibration  with 
respect  to  its  noise  characteristics). 

Noise  Considerations  and  Noise  Variance  Matrix. 


Before we utilize the instrument for experiments or tests we can calibrate the noise by setting x(1) (the input) equal to zero, hence the only output is $v_k$. Many experiments exist in which we cannot control the input (for example, set the input equal to zero) in order to calibrate the instrument. An example is a missile flight test for which we want to calibrate a tracking radar with respect to its noise for that region of tracking space. In this case one needs a higher quality trajectory measuring device (optical perhaps), or else a minimum of three redundant sensors such that differencing makes the calibration results independent of the trajectory (see reference (5)). The remainder of the discussions in this report assumes we can control the inputs to zero.


Many instrumentation systems observing dynamical processes have an upper bound on the observation time, which in conjunction with samples per second sets a maximum sample size, say $k_{\max}$. If we now have time in advance to prepare for the test, we can study the outputs for samples up to $k_{\max}$, say

\[ \langle v = (v_1, v_2, \ldots, v_{k_{\max}}) \tag{26} \]

and repeat the sequence (reset the instrument) $j_{\max}$ times. That is, we obtain $j_{\max}$ vectors each of dimension $k_{\max}$,

\[ \langle v_j = (v_1, v_2, \ldots, v_{k_{\max}})_j \tag{27} \]

where j = 1, ..., $j_{\max}$, and $j_{\max}$ may be whatever economical number we can afford. We certainly can not calibrate to infinity.

The k discrete points may be taken as points off of a continuous curve $v_j(t)$ as shown in Figure (2.)

Fig (2.)  Sequences of Time-Correlated Noise

For example, suppose we are planning to use the instrument in a number of tests or experiments such that this particular device is to measure a constant during each test. The duration of each test is such that this particular instrument takes $k_{\max}$ samples. The $k_{\max}$ is usually dictated by economy of data processing, time-sharing of a complete system of sensor outputs via telemetry, etc.

We can record the $j_{\max}$ sequences (row vectors), each of dimension $k_{\max}$, or sequentially feed the data output into a digital computer data-processing program.

What should we compute in the program? Let us return briefly to the unweighted case where the unweighted estimate by equation (11) is

\[ \hat a(k) = \langle z\,1\rangle\,\frac{1}{k} \]

and the error square term is

\[ \tilde a^{2}(k) = \frac{1}{k^{2}}\,\langle 1\,v\rangle\langle v\,1\rangle \]

In an actual test with an input different from zero we do not know the values of $(v_1, v_2, \ldots, v_k)$, hence we can not compute $\tilde a^{2}(k)$. For example, suppose some arbitrary noise sequence $\langle v_j$ occurs during the test; then the parameter estimation error based on a sample of size k occurring as a result of the jth noise sequence is

\[ \tilde a_j^{2}(k) = \frac{1}{k^{2}}\,\langle 1\,v_j\rangle\langle v_j\,1\rangle \tag{30} \]

The average error over all $j_{\max}$ noise sequences is

\[ \sigma^{2}_{\tilde a\tilde a}(k) = \frac{1}{j_{\max}}\left[\tilde a_1^{2}(k) + \tilde a_2^{2}(k) + \cdots + \tilde a_j^{2}(k) + \cdots + \tilde a_{j_{\max}}^{2}(k)\right] \tag{31} \]

or in summation form

\[ \sigma^{2}_{\tilde a\tilde a}(k) = \frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} \tilde a_j^{2}(k) \]

The scalar $\sigma^{2}_{\tilde a\tilde a}(k)$ is called the variance of the estimate of the parameter, or the average squared error in the estimate of the parameter over all experiments j.

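If we use the dyad expression of equation (17) in equation (31) we obtain

\[ \sigma^{2}_{\tilde a\tilde a}(k) = \frac{1}{j_{\max}}\,\frac{1}{k^{2}}\left[\langle 1\,v_1\rangle\langle v_1\,1\rangle + \langle 1\,v_2\rangle\langle v_2\,1\rangle + \cdots + \langle 1\,v_{j_{\max}}\rangle\langle v_{j_{\max}}\,1\rangle\right] \]

Factoring out the summation vector from each end

\[ \sigma^{2}_{\tilde a\tilde a}(k) = \frac{1}{k^{2}}\,\langle 1\left[\left(v_1\rangle\langle v_1 + \cdots + v_j\rangle\langle v_j + \cdots + v_{j_{\max}}\rangle\langle v_{j_{\max}}\right)\frac{1}{j_{\max}}\right] 1\rangle \]

or in summation form

\[ \sigma^{2}_{\tilde a\tilde a}(k) = \frac{1}{k^{2}}\,\langle 1\left[\frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} v_j\rangle\langle v_j\right] 1\rangle \]

The k×k matrix is the arithmetic mean of the $j_{\max}$ dyads and will be designated as

\[ R = \frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} v_j\rangle\langle v_j \tag{36} \]

We shall also occasionally use the notation

\[ R = Q_{vv} \]

(both k×k), as occurs in many of the modern estimation publications.

We shall also use the notation or symbol for the "expectation operator"

\[ E_j\{\,\cdot\,\} = \lim_{j_{\max}\to\infty} \frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} \{\,\cdot\,\} \tag{38} \]

However, from the practical world standpoint we assume

\[ \lim_{j_{\max}\to\infty} \frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} v_j\rangle\langle v_j = \frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} v_j\rangle\langle v_j + E_r \tag{39} \]

where the error matrix $E_r$ is almost zero and $j_{\max}$ on the right-hand side is dictated by a finite population large enough to be statistically representative of the infinite population and economically available.

Hence, throughout the paper we assume

\[ E_j\{v_j\rangle\langle v_j\} = \frac{1}{j_{\max}} \sum_{j=1}^{j_{\max}} v_j\rangle\langle v_j = Q_{vv} \tag{40} \]

During an actual experiment $\langle v_j$ comes from an infinite universe or population; but from the real-world calibration standpoint we must make computations based on a countable, finite and economical population.

Note that R is not the variance with respect to the noise mean; however we shall henceforth refer to it as the instrument, or merely noise, variance matrix.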

It is the variance with respect to a different "origin", not the mean as origin.

The variance of the noise with respect to its mean is

\[ E_j\{(v_j\rangle - \bar v\rangle)(\langle v_j - \langle\bar v)\} \tag{41} \]

where the mean is

\[ \bar v\rangle = E_j\{v_j\rangle\} \]

and can be computed to give us more information about the noise characteristics.

The expression of equation (41) is the most familiar expression for a variance matrix.

A recursive method for digitally computing the matrix Q of equation (40) for any number of vectors $v_j\rangle$ is given in appendix
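The report's own recursive method is not reproduced in this text. As a hedged sketch of one way such a computation can be organized, the standard running-mean update folds each new dyad into the current average without storing earlier noise vectors. The function names and the sample noise vectors below are hypothetical.

```python
def dyad(v):
    """Outer product v><v of a vector with itself, as a list of rows."""
    return [[vi * vj for vj in v] for vi in v]


def update_mean_dyad(R, n, v):
    """Fold the n-th noise vector (n counted from 1) into the running mean R.

    Uses R_n = R_(n-1) + (v><v - R_(n-1)) / n, the usual running-average form.
    """
    D = dyad(v)
    size = len(v)
    return [[R[i][j] + (D[i][j] - R[i][j]) / n for j in range(size)]
            for i in range(size)]


def batch_mean_dyad(vectors):
    """Arithmetic mean of the dyads, computed in one pass for comparison."""
    size = len(vectors[0])
    total = [[0.0] * size for _ in range(size)]
    for v in vectors:
        D = dyad(v)
        for i in range(size):
            for j in range(size):
                total[i][j] += D[i][j]
    return [[x / len(vectors) for x in row] for row in total]


noise = [[1.0, 0.5], [-0.5, -1.0], [0.2, 0.1]]   # hypothetical sequences, k = 2
R = [[0.0, 0.0], [0.0, 0.0]]
for n, v in enumerate(noise, start=1):
    R = update_mean_dyad(R, n, v)
R_batch = batch_mean_dyad(noise)
```

The recursive and batch results agree to rounding error; the recursive form only ever holds one k×k matrix and the current vector in storage.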


The expected error in the estimate (one-dimensional ellipsoid of uncertainty) of the parameter by equation (22) and equation (36) for unweighted estimation is

\[ \sigma^{2}_{\tilde a\tilde a}(k) = \frac{1}{k^{2}}\,\langle 1\,R\,1\rangle \tag{43} \]

Derivation of the Weights.

Consider the data-vector of equation (5) which occurs as the outcome of an experiment "confused" by an arbitrary noise sequence $\langle v_j$; then

\[ \langle z_j = a_0\,\langle 1 + \langle v_j = \hat a_j(k)\,\langle 1 + \langle e_j \tag{44} \]

Note that the parameter $a_0$ does not change with j (that is, with the exciting noise sequence $\langle v_j$) but all variables subscripted with j do.

We may also take the state of mind that equation (44) is the result of repeating the experiment $j_{\max}$ times and $\langle z_j$ is the data sequence occurring as a result of $a_0$ and $\langle v_j$.

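We now seek a sequence of k scalar weights designated as a column vector (independent of j)

\[ w\rangle = (w_1, w_2, \ldots, w_k)^{T} \tag{45} \]

such that the inner-product of $w\rangle$ with equation (44) is

\[ \langle z_j\,w\rangle = a_0\,\langle 1\,w\rangle + \langle v_j\,w\rangle = \hat a_j(k)\,\langle 1\,w\rangle + \langle e_j\,w\rangle \tag{46} \]

where the conditions hold

\[ \langle 1\,w\rangle = 1 \tag{47} \]

\[ \langle e_j\,w\rangle = 0 \tag{48} \]

Note that we want a single vector $w\rangle$ to be used for all possible noise sequences $\langle v_j$.

Using the constraints of equations (47) and (48) in equation (46)

\[ \langle z_j\,w\rangle = a_0 + \langle v_j\,w\rangle = \hat a_{jw}(k) \tag{49} \]

or the weighted estimate of the single parameter based on k samples is a linear combination of the data

\[ \hat a_{jw}(k) = z_{1j} w_1 + z_{2j} w_2 + \cdots + z_{kj} w_k \tag{50} \]

The error in the estimate of the parameter is

\[ a_0 - \hat a_{jw}(k) = \tilde a_{jw}(k) = -\langle v_j\,w\rangle \tag{51} \]

Since the inner-product of two vectors is a scalar and commutativity holds,

\[ \tilde a_{jw}(k) = -\langle w\,v_j\rangle \tag{52} \]

The square of the error in the weighted estimate of the parameter by equation (51) and equation (52) is

\[ (\tilde a_{jw}(k))^{2} = \langle w\,v_j\rangle\langle v_j\,w\rangle \tag{53} \]

or

\[ (\tilde a_{jw}(k))^{2} = \langle w\,[\,v_j\rangle\langle v_j\,]\,w\rangle \tag{54} \]

The average value of the error-squared for all possible noise sequences j is

\[ \sigma^{2}_{\tilde a\tilde a,w}(k) = \left[\tilde a_{1w}^{2}(k) + \tilde a_{2w}^{2}(k) + \cdots + \tilde a_{jw}^{2}(k) + \cdots\right]\frac{1}{j_{\max}} = \sum_{j=1}^{j_{\max}} \tilde a_{jw}^{2}(k)\,\frac{1}{j_{\max}} \tag{55} \]

which is the weighted variance of the estimate of the parameter.

As before the "expected" error square is

\[ E_j\{\tilde a_{jw}^{2}(k)\} = \sum_{j=1}^{j_{\max}} \tilde a_{jw}^{2}(k)\,\frac{1}{j_{\max}} \tag{56} \]

If we now use the dyadic expression of equation (54) in equation (55) we obtain

\[ \sigma^{2}_{\tilde a\tilde a,w} = \langle w\left[\left(v_1\rangle\langle v_1 + \cdots + v_{j_{\max}}\rangle\langle v_{j_{\max}}\right)\frac{1}{j_{\max}}\right] w\rangle \tag{57} \]

or by equation (36)

\[ \sigma^{2}_{\tilde a\tilde a,w} = \langle w\,R\,w\rangle \tag{58} \]

Equation (58) is quadratic in the unknown vector $w\rangle$. We now seek a vector $w\rangle$ which minimizes the variance of the estimate of the parameter over all experiments (or noise sequences j) and also satisfies the constraint of equation (47).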
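Appendix C is not part of this text, so the following is only a hedged sketch of the standard Lagrange-multiplier argument for this constrained minimum. It assumes R is symmetric and positive definite; the multiplier $\lambda$ and the function $L$ are symbols introduced for the sketch.

```latex
\begin{align*}
L(w,\lambda) &= \langle w\,R\,w\rangle - 2\lambda\,\bigl(\langle 1\,w\rangle - 1\bigr) \\
\frac{\partial L}{\partial w} &= 2R\,w\rangle - 2\lambda\,1\rangle = 0\rangle
  \quad\Longrightarrow\quad w\rangle = \lambda\,R^{-1}\,1\rangle \\
\langle 1\,w\rangle = 1 &\quad\Longrightarrow\quad
  \lambda = \frac{1}{\langle 1\,R^{-1}\,1\rangle} \\
\langle w\,R\,w\rangle_{\min} &= \lambda^{2}\,\langle 1\,R^{-1} R\,R^{-1}\,1\rangle
  = \frac{1}{\langle 1\,R^{-1}\,1\rangle}
\end{align*}
```

Because R is positive definite, the stationary point is the unique minimum on the hyperplane $\langle 1\,w\rangle = 1$.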


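The solution, by appendix C, equation (15), is

\[ w\rangle = \frac{R^{-1}\,1\rangle}{\langle 1\,R^{-1}\,1\rangle} \tag{59} \]

Utilizing equation (59) in equation (58)

\[ \sigma^{2}_{\tilde a\tilde a,w} = \frac{1}{\langle 1\,R^{-1}\,1\rangle} \tag{60} \]

and the weighted estimate by equation (59) in equation (49) is

\[ \hat a_{jw}(k) = \frac{\langle z_j\,R^{-1}\,1\rangle}{\langle 1\,R^{-1}\,1\rangle} \tag{61} \]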
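A small numerical sketch of the minimum-variance weights follows, assuming a symmetric positive definite noise matrix. The matrix values are hypothetical, and the helpers `solve` and `quad` are illustrative names; the linear solve is ordinary Gaussian elimination, not a method taken from the report.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def quad(w, R):
    """Quadratic form <w R w>."""
    n = len(w)
    return sum(w[i] * R[i][j] * w[j] for i in range(n) for j in range(n))


R = [[4.0, 1.0, 0.0],
     [1.0, 2.0, 0.5],
     [0.0, 0.5, 1.0]]               # hypothetical correlated-noise matrix, k = 3
ones = [1.0, 1.0, 1.0]

u = solve(R, ones)                   # R^{-1} 1>
scale = sum(u)                       # <1 R^{-1} 1>
w = [ui / scale for ui in u]         # weight vector of the form in eq. (59)

var_weighted = quad(w, R)            # equals 1 / <1 R^{-1} 1>
var_unweighted = quad([1.0 / 3.0] * 3, R)   # equi-weight case, (1/k^2) <1 R 1>
```

In this example the noisiest sample (first diagonal entry) receives the smallest weight, and the weighted variance comes out below the equi-weight variance, which is the sense of "better" used in the text.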
SECTION III


ESTIMATION OF A CONSTANT VECTOR PLUS NOISE


This section develops the unweighted and weighted estimation equations for a constant vector plus noise. We utilize the concepts and notation of the scalar case, except that now we assume that there are m measurement variables $(z_1, z_2, \ldots, z_m)$ and an experiment or test for which we take $k_{\max}$ observations. During the test there will be some noise vector sequence

\[ V(j), \quad m \times k_{\max} \tag{1} \]

out of a possible $j_{\max}$ sequences

\[ \left[\,V(1),\ \ldots,\ V(j),\ \ldots,\ V(j_{\max})\,\right] \tag{2} \]

where $j_{\max}$ is infinite and the bracket designates a "row vector or matrix" of $m \times k_{\max}$ matrices.

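We designate the kth observation and its relation to the noise as

\[ z(k)\rangle = a_0\rangle + v(k)\rangle \tag{3} \]

where the unknown constant vector is $a_0\rangle$. One may interpret the constant vector of equation (3) as that of equation (I-9), hence

(4)

If we form a data-matrix by a row of column vectors

\[ Z = \left[\,z(1)\rangle,\ z(2)\rangle,\ \ldots,\ z(k)\rangle\,\right] \tag{5} \]

and factor out the constant vector

\[ Z = a_0\rangle\langle 1 + V \tag{6} \]

where Z and V are m×k.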


Consider $\hat a_j\rangle$ an arbitrary m-dimensional vector and the error or residual vector $e(k)_j\rangle$ such that

\[ z(k)_j\rangle = \hat a_j\rangle + e(k)_j\rangle = a_0\rangle + v(k)_j\rangle \tag{7} \]

The data-matrix equations for all k observations become

\[ Z(j) = a_0\rangle\langle 1 + V(j) = \hat a_j\rangle\langle 1 + E(j) \tag{8} \]

If we subtract the terms

\[ [\,a_0\rangle - \hat a_j\rangle\,]\,\langle 1 = E(j) - V(j) \tag{9} \]

Unweighted  Least  Squares  Estimate 

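The arithmetic average of the vectors using none of the noise characteristics yields

\[ \hat a_j\rangle = \frac{1}{k_{\max}} \sum_{k=1}^{k_{\max}} z(k)_j\rangle \tag{10} \]

which is the unweighted least-squares estimate.

In vector-matrix form we obtain equation (10) by multiplying equation (9) by the column vector

\[ 1\rangle\,\frac{1}{k_{\max}} \]

with the constraint of

\[ E(j)\,1\rangle = 0\rangle \]

The error in the unweighted estimate resulting from the jth noise sequence by equation (11) is

\[ a_0\rangle - \hat a_j\rangle = \tilde a_j\rangle = -V_j\,1\rangle\,\frac{1}{k_{\max}} \tag{13} \]

Transposing (13)

\[ \langle\tilde a_j = -\frac{1}{k_{\max}}\,\langle 1\,V_j^{T} \tag{14} \]

The dyadic product of (14) and (13) is the m×m matrix

\[ \tilde a_j\rangle\langle\tilde a_j = V_j\,1\rangle\langle 1\,V_j^{T}\,\frac{1}{k_{\max}^{2}} \tag{15} \]

The variance matrix of the unweighted estimate of the parameters is the average over all noise sequences j and is the symmetric matrix

\[ \Sigma_{\tilde a\tilde a} = E_j\{V_j\,1\rangle\langle 1\,V_j^{T}\}\,\frac{1}{k_{\max}^{2}} \tag{16} \]

The above m×m matrix represents the uncertainty ellipsoid in m-space. The trace of the dyad of equation (15) is the inner-product term

\[ \langle\tilde a_j\,\tilde a_j\rangle = \langle 1\,V_j^{T} V_j\,1\rangle\,\frac{1}{k_{\max}^{2}} \]

and the trace of equation (16) is

\[ \operatorname{tr}\Sigma_{\tilde a\tilde a} = \frac{1}{k_{\max}^{2}}\,\langle 1\,Q_{vv}\,1\rangle \]

where $Q_{vv}$ is the average of the matrix products

\[ Q_{vv} = \sum_{j=1}^{j_{\max}} V(j)^{T}\,V(j)\,\frac{1}{j_{\max}} \tag{20} \]

with dimensions (k×k) = (k×m)(m×k).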

Weighted  Least  Squares  Estimate 

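We now seek an estimate with a smaller ellipsoid of uncertainty. Consider equation (6)

\[ Z_j = a_0\rangle\langle 1 + V_j = \hat a_j\rangle\langle 1 + E_j \tag{21} \]

We need a $k_{\max}$-dimensional column vector $w\rangle$ such that

\[ Z_j\,w\rangle = a_0\rangle\langle 1\,w\rangle + V_j\,w\rangle = \hat a_j\rangle\langle 1\,w\rangle + E_j\,w\rangle \tag{22} \]

satisfying the conditions

\[ \langle 1\,w\rangle = 1 \tag{23} \]

\[ E_j\,w\rangle = 0\rangle \tag{24} \]

Then, using the constraint equations (23) and (24), equation (22) becomes

\[ Z_j\,w\rangle = a_0\rangle + V_j\,w\rangle = \hat a_{jw}\rangle \tag{26} \]

Note that the weighted estimate is a linear combination of the observation vectors

\[ \hat a_{jw}\rangle = z(1)_j\rangle\,w_1 + z(2)_j\rangle\,w_2 + \cdots + z(k_{\max})_j\rangle\,w_{k_{\max}} \tag{27} \]

The error in the estimate by equation (26) is

\[ a_0\rangle - \hat a_{jw}\rangle = \tilde a_{jw}\rangle = -V_j\,w\rangle \tag{28} \]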
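The statement that the estimation error depends only on the noise and the weights, not on the constant vector itself, can be checked numerically. The sketch below uses hypothetical values for the constant vector, the noise matrix, and the weights; `mat_vec` is an illustrative helper.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


a0 = [3.0, -1.0]                       # hypothetical constant vector, m = 2
V = [[0.20, -0.10, 0.05],              # hypothetical noise matrix, m x k with k = 3
     [-0.30, 0.10, 0.40]]
# Data matrix Z = a0><1 + V: every column is a0 plus one noise column.
Z = [[a0[i] + V[i][k] for k in range(3)] for i in range(2)]

w = [0.5, 0.3, 0.2]                    # any weights satisfying <1 w> = 1
a_w = mat_vec(Z, w)                    # weighted estimate Z w>
error = [a0[i] - a_w[i] for i in range(2)]
Vw = mat_vec(V, w)                     # the error should equal -V w>
```

Changing `a0` leaves `error` unchanged, which is why the weights can be designed from noise calibration data alone.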


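and transposing equation (28)

\[ \langle\tilde a_{jw} = -\langle w\,V_j^{T} \tag{29} \]

The m×m random matrix dyadic product is

\[ \tilde a_{jw}\rangle\langle\tilde a_{jw} = V_j\,w\rangle\langle w\,V_j^{T} \tag{30} \]

The weighted variance of the estimate is the symmetric m×m matrix

\[ \Sigma_{\tilde a\tilde a,w} = E_j\{V_j\,w\rangle\langle w\,V_j^{T}\} \tag{31} \]

The trace of (30) is

\[ \langle\tilde a_{jw}\,\tilde a_{jw}\rangle = \langle w\,V_j^{T} V_j\,w\rangle \]

and the trace of equation (31) is

\[ \operatorname{tr}\Sigma_{\tilde a\tilde a,w} = \langle w\left[\sum_{j=1}^{j_{\max}} V_j^{T} V_j\,\frac{1}{j_{\max}}\right] w\rangle = \langle w\,Q_{vv}\,w\rangle \]

where the k×k matrix $Q_{vv}$ is given by equation (20).

The trace of the ellipsoid of equation (34) and the hyperplane constraint of equation (23) are exactly the same as the minimization problem of equation (II-58), and by equation (II-59) the weight vector is

\[ w\rangle = \frac{Q_{vv}^{-1}\,1\rangle}{\langle 1\,Q_{vv}^{-1}\,1\rangle} \]

and the weighted estimate of the parameter vector is

\[ \hat a_w\rangle = Z\,w\rangle = \frac{Z\,Q_{vv}^{-1}\,1\rangle}{\langle 1\,Q_{vv}^{-1}\,1\rangle} \]

with an ellipsoid of uncertainty, by equation (36) in (30),

\[ E\{\tilde a_w\rangle\langle\tilde a_w\} = \frac{E\{V_j\,Q_{vv}^{-1}\,1\rangle\langle 1\,Q_{vv}^{-1}\,V_j^{T}\}}{\left(\langle 1\,Q_{vv}^{-1}\,1\rangle\right)^{2}} \]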
SECTION IV.


POLYNOMIAL PARAMETER ESTIMATION


The classical approximation of a function by a polynomial with p parameters (degree p-1), and the weighted and unweighted least squares estimates of the parameters, are developed.

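Consider the polynomial

\[ z_k = a_0 + a_1 x_k + a_2 x_k^{2} + \cdots + a_{p-1} x_k^{p-1} + v_k \tag{1} \]

\[ z_k = \hat a_0 + \hat a_1 x_k + \hat a_2 x_k^{2} + \cdots + \hat a_{p-1} x_k^{p-1} + e_k \tag{2} \]

Separating the parameters

\[ z_k = (a_0, a_1, \ldots, a_{p-1}) \begin{pmatrix} 1 \\ x_k \\ \vdots \\ x_k^{p-1} \end{pmatrix} + v_k \tag{3} \]

\[ z_k = (\hat a_0, \hat a_1, \ldots, \hat a_{p-1}) \begin{pmatrix} 1 \\ x_k \\ \vdots \\ x_k^{p-1} \end{pmatrix} + e_k \tag{4} \]

or

\[ z_k = \langle a\,f(k)\rangle + v_k = \langle\hat a\,f(k)\rangle + e_k \tag{5} \]

where

\[ f(k)\rangle = (1,\ x_k,\ x_k^{2},\ \ldots,\ x_k^{p-1})^{T} \tag{6} \]

with the one-sided inverse property

\[ F\,F^{T}(F F^{T})^{-1} = I \quad (p \times p) \tag{16} \]

then

\[ \langle z\,F^{T}(F F^{T})^{-1} = \langle a + \langle v\,F^{T}(F F^{T})^{-1} \tag{17} \]

\[ \langle z\,F^{T}(F F^{T})^{-1} = \langle\hat a + \langle e\,F^{T}(F F^{T})^{-1} \tag{18} \]

and

\[ \langle e\,F^{T} = \langle 0. \tag{19} \]

Note that $\langle\hat a$ and $\langle e$ correspond to those values of $\langle\hat a$ and $\langle e$ such that $\langle e\,e\rangle$ is a minimum. The geometry and derivations are derived in reference (4) via partial derivatives and via orthogonal projections.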

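Differencing equation (17) and (18)

\[ \langle a - \langle\hat a_j = \langle\tilde a_j = -\langle v_j\,F^{T}(F F^{T})^{-1} \tag{20} \]

where the j as before refers to the jth noise sequence. Transposing

\[ \tilde a_j\rangle = -(F F^{T})^{-1} F\,v_j\rangle \tag{21} \]

The dyadic product of (20) and (21) is

\[ \tilde a_j\rangle\langle\tilde a_j = (F F^{T})^{-1} F\,v_j\rangle\langle v_j\,F^{T}(F F^{T})^{-1} \tag{22} \]

and expected value over all noise sequences is

\[ E_j\{\tilde a_j\rangle\langle\tilde a_j\} = (F F^{T})^{-1} F\,E_j\{v_j\rangle\langle v_j\}\,F^{T}(F F^{T})^{-1} \tag{23} \]

where the noise characteristics are

\[ Q_{vv} = E_j\{v_j\rangle\langle v_j\} \]

Using this in the preceding equation we see that the ellipsoid of uncertainty in p-space (the p×p symmetric matrix describing the variance of the estimate of the parameters) is

\[ \Sigma_{\tilde a\tilde a} = (F F^{T})^{-1} F\,Q_{vv}\,F^{T}(F F^{T})^{-1} \]

with dimensions (p×p)(p×k)(k×k)(k×p)(p×p).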

Weighted Least Squares

This section derives the classical weighted least-squares equations in a vector-space setting.

We seek a k×p matrix W such that, post-multiplying equation (10) by W, if the conditions

\[ F\,W = I \tag{29} \]

\[ \langle e_j\,W = \langle 0 \tag{30} \]

hold, then

\[ \langle z_j\,W = \langle a + \langle v_j\,W = \langle\hat a_{jw} \tag{31} \]
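If we factor W into its row-space, that is k vectors of dimension p,

\[ W = \begin{pmatrix} \langle w^{1} \\ \langle w^{2} \\ \vdots \\ \langle w^{k} \end{pmatrix} \tag{32} \]

then we can consider the p-dimensional row vector of parameters $\langle\hat a_w$ as a linear combination of the scalar data and the weighting vectors

\[ \langle\hat a_w = (z_1, z_2, \ldots, z_k) \begin{pmatrix} \langle w^{1} \\ \langle w^{2} \\ \vdots \\ \langle w^{k} \end{pmatrix} \tag{33} \]

or

\[ \langle\hat a_w = z_1\,\langle w^{1} + z_2\,\langle w^{2} + \cdots + z_k\,\langle w^{k} \tag{34} \]

The error vector in the estimate of the parameters by equation (31) is

\[ \langle a - \langle\hat a_{jw} = \langle\tilde a_{jw} = -\langle v_j\,W \tag{35} \]

where as before the j denotes the estimate resulting from the jth noise sequence

\[ \langle v_j = (v_1, v_2, \ldots, v_{k_{\max}})_j \tag{36} \]

and we want a W to be used for any of the j's, that is, W is not a function of j.

The transpose of (35) is

\[ \tilde a_{jw}\rangle = -W^{T} v_j\rangle \tag{37} \]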


The outer-product and inner products respectively are

\[ \tilde a_{jw}\rangle\langle\tilde a_{jw} = W^{T} v_j\rangle\langle v_j\,W \tag{38} \]

\[ \langle\tilde a_{jw}\,\tilde a_{jw}\rangle = \langle v_j\,W\,W^{T} v_j\rangle \tag{39} \]

By equation (Appendix B-79)

(40)

Form the difference matrix

(41)

The sum over all j divided by $j_{\max}$ is

(42)

\[ \Sigma_{\tilde a\tilde a,w} - F\,W \quad (p \times p) \tag{43} \]

The trace of equation (43) is

\[ \operatorname{tr}\Sigma_{\tilde a\tilde a,w} - \operatorname{tr}(F\,W) \tag{44} \]

The trace of equation (41)

(45)

The gradient of the scalar differences of equation (45) with respect to W (p×k) is

(46)

and by equation (40) and equation (B-93)

(47)

The expected value over all j is

(48)


Minimizing the scalar difference expression of equation (48) requires the gradient term of equation (48) to be equated to the [0] matrix.


WT  Q2  -  F  =  [0] 

(49) 

*'r  tap 

T  -1 

W  2  . 

(50) 

constraint  of  equation  (29)  is 

FW  *  I 

(51) 

pxp 



and  transposing 


T  T 

W  F  =  I 


m 

hence  multiplying  equation  (50  )  by  F 


T  T  -IT 

W  F  2  =  12  =  F  0  J  F 

pxp  pxp 


Transposing  equation  (50 


-1  T 
WI»QyV  F 

KXp  vv 


and  using  (53  ) 


W  .( F  (T1.  FT) 


(kxp)  ,  vv 
(pxp) 


-1  T 
0.  F1 
vv 

kxk  kxp 


and solving for W

\[ W = Q_{vv}^{-1} F^{T} (F\,Q_{vv}^{-1} F^{T})^{-1} \qquad (k \times p) \]

or

\[ W^{T} = (F\,Q_{vv}^{-1} F^{T})^{-1} F\,Q_{vv}^{-1} \qquad (p \times k) \]

Utilizing the weight-matrix (50) in equation (33)

\[ \langle\hat a_{jw} = \langle z_j\,Q_{vv}^{-1} F^{T} (F\,Q_{vv}^{-1} F^{T})^{-1} \]

which is the weighted estimate of the parameters $\langle a$.
The error in the estimate by equation (58) is


(52)

(53)

(54)

(55)

(56)

(57)

(58)

(59)
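The weight matrix $W = Q_{vv}^{-1} F^{T} (F\,Q_{vv}^{-1} F^{T})^{-1}$ can be formed with linear solves rather than explicit inverses. The sketch below is illustrative only: it assumes $Q_{vv}$ is symmetric and positive definite, uses a hypothetical tridiagonal $Q_{vv}$ and a straight-line model (p = 2), and the names `solve` and `weight_matrix` are not from the report.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def weight_matrix(F, Q):
    """Return W = Q^{-1} F^T (F Q^{-1} F^T)^{-1} as a k x p list of rows."""
    p, k = len(F), len(F[0])
    # Row i of F is column i of F^T, so each solve gives one column of Q^{-1} F^T.
    cols = [solve(Q, F[i]) for i in range(p)]
    X = [[cols[i][r] for i in range(p)] for r in range(k)]        # Q^{-1} F^T, k x p
    M = [[sum(F[i][r] * X[r][j] for r in range(k)) for j in range(p)]
         for i in range(p)]                                        # F Q^{-1} F^T, p x p
    # W = X M^{-1}; M is symmetric, so row r of W solves M y = X[r].
    return [solve(M, X[r]) for r in range(k)]


xs = [0.0, 1.0, 2.0, 3.0]
F = [[1.0] * 4, xs]                       # p = 2: columns are (1, x_k)
Q = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]                # hypothetical correlated noise matrix

W = weight_matrix(F, Q)
FW = [[sum(F[i][r] * W[r][j] for r in range(4)) for j in range(2)]
      for i in range(2)]                  # should reproduce the constraint F W = I

z = [2.0 + 0.5 * x for x in xs]           # noise-free data row vector <z
a_hat_w = [sum(z[r] * W[r][j] for r in range(4)) for j in range(2)]   # <z W
```

The check on `FW` confirms the constraint of equation (29), and the noise-free data return the generating parameters, as any W with F W = I must.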
SECTION V.


MULTI-VARIABLE POLYNOMIAL


Many missile range data processing tasks pose the problem of simultaneously fitting time polynomials to a number of variables. For example, a three-dimensional trajectory with three coordinates of position x(t), y(t), z(t) and three coordinates of velocity $\dot x(t)$, $\dot y(t)$ and $\dot z(t)$, which we wish to approximate, can be expressed as

\[ \begin{pmatrix} z^{1}(t) \\ z^{2}(t) \\ z^{3}(t) \\ z^{4}(t) \\ z^{5}(t) \\ z^{6}(t) \end{pmatrix} = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \\ \dot x(t) \\ \dot y(t) \\ \dot z(t) \end{pmatrix} \]

The following derivations assume q coordinates instead of six. Consider the approximating parameters for each coordinate as given by equation (IV-5) for a single variable, except now the superscript 1 to q designates the coordinate, that is
Packaging the above q coordinates into a q-dimensional column vector

hence

Equation (7) is the kth observation of all q variables. If we form the data matrix for k observations

factoring out the parameter matrices

\[ Z = A\,F + V = \hat A\,F + E \tag{11} \]

with dimensions (q×k) = (q×p)(p×k) + (q×k).

The unweighted estimate does not require any characteristics of the noise $V_j$ (q×k), where we assume that there are $j_{\max}$ different sequences of noise matrices.

If  equation  ( 11)  is  post-aultiplied  by  the  transpose  of  F 


m  T  T 

ZjF  =AFF  +  VjF 


=  AjFF  +Ejf-  V. 

n  m 

and  the  pxp  matrix  FFr  is  full  rank,  then  multipling  by  (FF  )  yields 


„T  /mT,-l 


Z«F  (FF  )  =  A  ♦  V,F  (FF  ) 


=  Aj  +  ejft(fft)'1 

The unweighted least squares condition is

$$E_j F^T = [0] \qquad (17)$$

with dimensions (q×k)(k×p) = q×p, which is shown in reference (4) using
partial derivatives and also shown algebraically via orthogonal projections.

Using (17) in (16)

$$\hat{A}_j = Z_j F^T (F F^T)^{-1} \qquad (18)$$

with dimensions q×p = (q×k)(k×p)(p×p).
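The chain from equation (11) to equation (18) can be checked numerically. The following is only a minimal sketch under stated assumptions: the helper names (`matmul`, `transpose`, `inverse`), the sample times, and the values chosen for A and V are illustrative and do not come from the report. Exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic; M must be full rank."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Assumed sizes: p = 3 polynomial coefficients, k = 5 times, q = 2 coordinates.
times = [0, 1, 2, 3, 4]
F = [[t ** i for t in times] for i in range(3)]      # p x k matrix of time powers
A = [[1, 2, 3], [0, -1, Fraction(1, 2)]]             # q x p true parameters (made up)
V = [[1, -1, 0, 1, -1], [0, 2, -2, 0, 1]]            # q x k noise sample (made up)

Ft = transpose(F)
right_factor = matmul(Ft, inverse(matmul(F, Ft)))    # F^T (F F^T)^-1, k x p

Z_clean = matmul(A, F)
Z_noisy = [[z + v for z, v in zip(zr, vr)] for zr, vr in zip(Z_clean, V)]

A_hat_clean = matmul(Z_clean, right_factor)          # noise-free data: returns A
A_hat_noisy = matmul(Z_noisy, right_factor)          # equation (18) on noisy data
noise_term = matmul(V, right_factor)                 # V F^T (F F^T)^-1 of equation (15)

# Residual E = Z - A_hat F; the least squares condition (17) says E F^T = [0].
fit = matmul(A_hat_noisy, F)
E = [[z - f for z, f in zip(zr, fr)] for zr, fr in zip(Z_noisy, fit)]
E_Ft = matmul(E, Ft)
```

With noise-free data the estimate reproduces A, and with noise it differs from A by exactly the term $V_j F^T (F F^T)^{-1}$ of equation (15).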

The error in the estimate by equations (15) and (16) is

$$A - \hat{A}_j = \tilde{A}_j = -V_j F^T (F F^T)^{-1} \qquad (19)$$

where $\tilde{A}_j$ is q×p.

The transpose of (19) is

$$\tilde{A}_j^T = -(F F^T)^{-1} F\,V_j^T \qquad (20)$$

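The two matrix products (major and minor), (larger and smaller),
(outer and inner) available are

$$\tilde{A}_j\,\tilde{A}_j^T = V_j F^T (F F^T)^{-2} F\,V_j^T \qquad (21)$$

with dimensions (q×p)(p×q), and

$$\tilde{A}_j^T\,\tilde{A}_j = (F F^T)^{-1} F\,V_j^T V_j F^T (F F^T)^{-1} \qquad (22)$$

with dimensions (p×q)(q×p).

The traces of the two are the same, that is

$$\mathrm{tr}(\tilde{A}_j\,\tilde{A}_j^T) = \mathrm{tr}(\tilde{A}_j^T\,\tilde{A}_j).$$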

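If we partition $\tilde{A}_j$ into p-dimensional row vectors

$$\tilde{A}_j = \begin{pmatrix} \langle \tilde{a}^1 \\ \vdots \\ \langle \tilde{a}^q \end{pmatrix}$$

and transposing

$$\tilde{A}_j^T = [\, \tilde{a}^1\rangle, \ldots, \tilde{a}^q\rangle \,]$$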


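The two matrix products of equations (21) and (22), using the row partition
of $\tilde{A}_j$ and its transpose, are

$$\tilde{A}_j\,\tilde{A}_j^T = \begin{pmatrix} \langle \tilde{a}^1\,\tilde{a}^1\rangle & \cdots & \langle \tilde{a}^1\,\tilde{a}^q\rangle \\ \vdots & & \vdots \\ \langle \tilde{a}^q\,\tilde{a}^1\rangle & \cdots & \langle \tilde{a}^q\,\tilde{a}^q\rangle \end{pmatrix}$$

where $\langle \tilde{a}^i$ is the i-th p-dimensional row of $\tilde{A}_j$,
which is an "outer-product" of "inner-products".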


The  product  of  equation  (22)  is 


or  an  "inner  product"  of  "outer  products" 


(28) 


(29) 
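The two ways of multiplying an error matrix by its transpose can be illustrated with a small numerical case. This is a sketch only: the 2×3 matrix `A_err` stands in for a hypothetical $\tilde{A}_j$ with q = 2 and p = 3, and the helper names are assumptions introduced for the illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def outer(u, v):
    """Dyad (outer product) of two vectors."""
    return [[a * b for b in v] for a in u]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A_err = [[1, -2, 3],
         [0, 4, -1]]                      # hypothetical q x p error matrix

# q x q product: every entry is an inner product of two rows.
gram = matmul(A_err, transpose(A_err))
gram_from_rows = [[sum(a * b for a, b in zip(ri, rj)) for rj in A_err]
                  for ri in A_err]

# p x p product: the sum over rows of the dyad of each row with itself.
dyad_sum = matmul(transpose(A_err), A_err)
accumulated = [[0] * 3 for _ in range(3)]
for row in A_err:
    dyad = outer(row, row)
    accumulated = [[x + y for x, y in zip(r1, r2)]
                   for r1, r2 in zip(accumulated, dyad)]

print(trace(gram), trace(dyad_sum))       # both equal the sum of squared entries
```

The q×q form is a matrix of inner products, the p×p form is a sum of q dyads, and both have the same trace.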


The geometrical significance of the many previous forms is obtained
from the representation of the q parameter error vectors, each of dimension
p, as a column of column vectors


(30) 


and  the  transpose 


(31) 


The  dyadic  product  yields 
and over all j


(33) 


"TTaa 

pxp  qq 

we  obtain  a  matrix  of  variance  matrices. 


The sum of the main-diagonal matrices of equation (33) is the
expected value of equation (29)


*  j 


a:  a. 
j  j 

pxp 


pxp 


+  . 


13>0 


Weighted  Least  Squares 

This section derives the sequence of weights. Consider equation (7)

$$Z_j = A\,F + V_j = \hat{A}_j\,F + E_j \qquad (35)$$

with dimensions q×k = (q×p)(p×k) + q×k.

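We seek a k×p matrix W such that post-multiplying (35) by W gives

$$Z_j W = A + V_j W = \hat{A}_{jw} \qquad (36)$$

with dimensions (q×k)(k×p), where

$$F\,W = I \qquad (37)$$

with dimensions (p×k)(k×p) = p×p, and

$$E_j\,W = [0] \qquad (38)$$

with dimensions (q×k)(k×p) = q×p; then

$$\hat{A}_{jw} = Z_j W = A + V_j W \qquad (39)$$

where $\hat{A}_{jw}$ is q×p.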


Factoring  Z  into  its  column  space  and  W  into  its  row  space 


Equation (41) states that we need a sequence of p-dimensional weighting
row vectors so that the weighted estimate of the q×p matrix of parameters
$\hat{A}_{jw}$ is a linear-dyadic combination of the q-dimensional data
vectors $z_k\rangle$.

When q is equal to one we see that equation ( 4 ) becomes equation
(IV-310.
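The conditions placed on W can be tried out on a small case. The sketch below is an illustration under loud assumptions: the particular form $W = R^{-1} F^T (F R^{-1} F^T)^{-1}$ with a diagonal $R^{-1}$ is one familiar weighting that satisfies $F W = I$ and is not taken from this section; the weights, sample times, parameter values, and helper names are all made up.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic; M must be full rank."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

F = [[1, 1, 1, 1], [0, 1, 2, 3]]          # p x k, straight-line fit (p = 2, k = 4)
A = [[1, 2], [3, -1]]                      # q x p true parameters (made up)
Z = matmul(A, F)                           # q x k noise-free data

inv_var = [1, 2, 3, 4]                     # hypothetical diagonal of R^-1
RinvFt = [[w * v for v in row] for w, row in zip(inv_var, transpose(F))]   # k x p
W = matmul(RinvFt, inverse(matmul(F, RinvFt)))                             # k x p

FW = matmul(F, W)                          # should be the p x p identity
A_w = matmul(Z, W)                         # weighted estimate Z W

# The same estimate as a sum of dyads: data column z_k times weight row w_k.
dyadic = [[0, 0], [0, 0]]
for z_k, w_k in zip(transpose(Z), W):
    d = outer(z_k, w_k)
    dyadic = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(dyadic, d)]
```

Any W with $F W = I$ returns A from noise-free data; the choice of W only changes how the noise term $V_j W$ enters the estimate.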

The error in the weighted estimate of the parameters by equation
(39) is

$$A - \hat{A}_{jw} = \tilde{A}_{jw} = -V_j W \qquad (42)$$

where $\tilde{A}_{jw}$ is q×p.

The transpose of equation (42) is

$$\tilde{A}_{jw}^T = -W^T V_j^T$$

with dimensions p×q.

The two matrix products are

$$\tilde{A}_{jw}\,\tilde{A}_{jw}^T = V_j\,W\,W^T\,V_j^T$$

with dimensions (q×p)(p×q),

(43)


(44)

and


The expected value over all j is


APPENDIX  A  -  MATRIX  TRACE  PROPERTIES 


The trace of a matrix, the trace of the product of two matrices,
and the trace of a matrix-sum are useful notions to aid the development
of the topics of Appendix B.

Consider a matrix $A$ of p rows and m columns, where m < p, and a
matrix $B$ of m rows and p columns; then the product

$$Q_1 = A\,B \qquad (1)$$

with dimensions p×p = (p×m)(m×p), is a p×p matrix.


The matrices A and B can be partitioned into their row and column
spaces as shown


A  * 


,  •  •  •  a 


(2) 


(3) 


The  product  Q  can  be  written  as  a  matrix  of  inner-products 


(U) 


(5) 
or as a sum of dyads (outer products)

$$Q_1 = A\,B = a_1\rangle\langle b^1 + \cdots + a_m\rangle\langle b^m$$

where $a_i\rangle$ is the i-th column of $A$ and $\langle b^i$ is the i-th row of $B$.

Equation (7) expresses $Q_1$ as a sum of m rank-one matrices.
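A small numerical case makes the sum-of-dyads form concrete. The matrices below are arbitrary illustrative values (p = 3, m = 2), and `matmul` and `outer` are helper names assumed for this sketch.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def outer(u, v):
    """Dyad (outer product): a len(u) x len(v) rank-one matrix."""
    return [[a * b for b in v] for a in u]

A = [[1, 2],
     [3, 4],
     [5, 6]]                 # p x m
B = [[1, 0, -1],
     [2, 1, 0]]              # m x p

Q1 = matmul(A, B)            # p x p product

# Pair column i of A with row i of B and add up the m resulting dyads.
columns_of_A = [list(col) for col in zip(*A)]
Q1_from_dyads = [[0] * 3 for _ in range(3)]
for a_col, b_row in zip(columns_of_A, B):
    dyad = outer(a_col, b_row)
    Q1_from_dyads = [[x + y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(Q1_from_dyads, dyad)]
```

Because m < p, the p×p product is built from only m rank-one pieces.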

If we commute the product we obtain a square m×m matrix

$$Q_2 = B\,A$$

with dimensions m×m = (m×p)(p×m),


and as before $Q_2$ can be written as a matrix of inner-products


or as a sum of dyadic products


(6) 


(7) 


(8) 


(9) 


(10) 


(11) 
Clearly matrix multiplication is not commutative, that is

$$A\,B \neq B\,A \qquad (12)$$

where $AB$ is p×p and $BA$ is m×m; in fact the matrices are not even of the same size.

However, the traces of both products are equal, that is

$$\mathrm{tr}(A\,B) = \mathrm{tr}(B\,A) \qquad (13)$$

The following will clarify the above relation.

If we have a column vector $x\rangle$ and a row vector $\langle y$ of the same
dimension p, then the dyadic product is the square, rank-one matrix $D$ of p rows
and p columns

$$D = x\rangle\langle y = \begin{pmatrix} x_1 y_1 & \cdots & x_1 y_p \\ \vdots & & \vdots \\ x_p y_1 & \cdots & x_p y_p \end{pmatrix} \qquad (14)$$

If we commute the product of Equation (14) we obtain

$$d = \langle y\,x\rangle = y_1 x_1 + y_2 x_2 + \cdots + y_p x_p \qquad (15)$$

a 1×1 matrix, that is, a scalar.


When the elements $y_i$ and $x_i$ are real field elements the products
commute, hence

$$x_i\,y_i = y_i\,x_i \qquad (16)$$

and Equation (15) [the inner product] can be written as the sum of the main
diagonal terms of $x\rangle\langle y$, which is the conventional definition of the
trace (tr) of a matrix, hence

$$\mathrm{tr}\,[\,x\rangle\langle y\,] = \langle y\,x\rangle. \qquad (17)$$
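The step from the trace of a single dyad to the equality of the traces of the two matrix products can be checked on a small case. The vectors and matrices are arbitrary illustrative values and the helper names are assumptions of this sketch.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# A single dyad: its trace is the inner product of the two vectors.
x = [1, 2, 3]
y = [4, 5, 6]
D = outer(x, y)
trace_of_dyad = trace(D)
inner_product = inner(y, x)

# A general product is a sum of dyads, so the same identity gives
# tr(AB) = tr(BA) even though AB is p x p and BA is m x m.
A = [[1, 2], [3, 4], [5, 6]]       # p x m
B = [[1, 0, -1], [2, 1, 0]]        # m x p
trace_AB = trace(matmul(A, B))
trace_BA = trace(matmul(B, A))
```

Here the trace of the dyad and the inner product are both 32, and the two product traces are both 4.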


The dyadic product is not as mysterious as many novices might
imagine; in fact, if we write Equation (14) as

$$D = x\rangle\langle y = [\, x\rangle y_1,\; x\rangle y_2,\; \ldots,\; x\rangle y_p \,] \qquad (18)$$

we see that the matrix $D$ when partitioned into its column space is a row
of p parallel column vectors. All p of the vectors lie on a line, hence $D$
is said to have rank one; that is, there is only one linearly
independent vector in the row "package" of column vectors.
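The rank-one property can be seen directly in a small example. In this sketch (illustrative values, hypothetical helper name `outer`), every column of the dyad is the column vector scaled by one element of the row vector, and every 2×2 minor vanishes.

```python
from itertools import combinations

def outer(u, v):
    return [[a * b for b in v] for a in u]

x = [1, 2, 3]                  # column vector
y = [4, 5, 6]                  # row vector
D = outer(x, y)                # D[i][j] = x[i] * y[j]

# Column j of D is x scaled by y[j], so all columns are parallel to x.
columns = [list(col) for col in zip(*D)]
scaled_copies = [[xi * yj for xi in x] for yj in y]

# Rank one also means every 2 x 2 minor of D is zero.
minors = [D[r1][c1] * D[r2][c2] - D[r1][c2] * D[r2][c1]
          for r1, r2 in combinations(range(3), 2)
          for c1, c2 in combinations(range(3), 2)]
```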
By equations (b-7) and (b-5) we can state

<|  *4 

We arrive at the result of equation (b-9) directly from (1)

•  *  4 


hence 


4^' 


Also  one  can  consider  the  gradient  as  an  operator 


l<^  "  4^  =  <|>4_ 


(b-9) 


(b-10) 


(b-ll) 


(b-12) 


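The dyadic-type operator

$$\frac{\partial}{\partial x\rangle}\,\langle x = \begin{pmatrix} \dfrac{\partial x_1}{\partial x_1} & \cdots & \dfrac{\partial x_p}{\partial x_1} \\ \vdots & & \vdots \\ \dfrac{\partial x_1}{\partial x_p} & \cdots & \dfrac{\partial x_p}{\partial x_p} \end{pmatrix}$$

(b-13)


(b-14)


when the coordinates are independent of each other, then


(b-15)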
Hence


<*4  =4-< 


(b-l6) 


(b-17) 

(13-18) 


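Case 2. $q = \langle x\,x\rangle$

When q is quadratic we can write q as the trace of the dyad

$$Q = x\rangle\langle x \qquad (b-19)$$

for

$$\mathrm{tr}\,Q = \mathrm{tr}\,(x\rangle\langle x) = \langle x\,x\rangle = q. \qquad (b-20)$$

The differential of the dyad is

$$dQ = dx\rangle\langle x + x\rangle\langle dx \qquad (b-21)$$

$$dq = \mathrm{tr}\,dQ = \langle x\,dx\rangle + \langle dx\,x\rangle = 2\,\langle x\,dx\rangle \qquad (b-22)$$

hence

$$\frac{\partial q}{\partial x} = 2\,\langle x \qquad (b-23)$$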

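Case 3. $q = \langle x\,B\,x\rangle$ (b-24)

For this case we have two different matrices

$$Q_1 = B\,x\rangle\langle x \qquad (b-25)$$

and

$$Q_2 = x\rangle\langle x\,B \qquad (b-26)$$

which under the trace operation map down to the same scalar

$$q = \mathrm{tr}\,Q_1 = \mathrm{tr}\,Q_2 = \langle x\,B\,x\rangle \qquad (b-27)$$

The differential of $Q_2 = Q$ is

$$dQ = dx\rangle\langle x\,B + x\rangle\langle dx\,B \qquad (b-28)$$

The trace of (b-28) is

$$\mathrm{tr}\,dQ = \langle x\,B\,dx\rangle + \langle dx\,B\,x\rangle \qquad (b-29)$$

The differential of (b-24) is

$$dq = \langle dx\,B\,x\rangle + \langle x\,B\,dx\rangle = \mathrm{tr}\,dQ \qquad (b-30)$$

$$dq = \langle x\,B^T\,dx\rangle + \langle x\,B\,dx\rangle \qquad (b-31)$$

$$dq = \langle x\,[\,B^T + B\,]\,dx\rangle \qquad (b-32)$$

We have

$$dq = \frac{\partial q}{\partial x}\,dx\rangle \qquad (b-33)$$

and by (b-32) and (b-33)

$$\frac{\partial q}{\partial x} = \langle x\,[\,B^T + B\,] \qquad (b-34)$$

and for symmetric B

$$B^T = B \qquad (b-35)$$

then

$$\frac{\partial q}{\partial x} = 2\,\langle x\,B \qquad (b-36)$$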
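The gradient of the quadratic form can be checked by differences. This is a sketch with made-up values: because q is quadratic in x, a central difference reproduces the derivative exactly in rational arithmetic, so no tolerance is needed. The function names are assumptions introduced for the illustration.

```python
from fractions import Fraction

def quad_form(x, B):
    """q = <x B x> for a vector x and a square matrix B."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

def gradient_row(x, B):
    """Row vector <x [B^T + B], the closed form of (b-34)."""
    n = len(x)
    return [sum(x[i] * (B[j][i] + B[i][j]) for i in range(n)) for j in range(n)]

def central_difference(x, B, h=Fraction(1, 2)):
    """Difference quotient of q in each coordinate; exact for a quadratic."""
    grad = []
    for j in range(len(x)):
        up = [v + (h if i == j else 0) for i, v in enumerate(x)]
        down = [v - (h if i == j else 0) for i, v in enumerate(x)]
        grad.append((quad_form(up, B) - quad_form(down, B)) / (2 * h))
    return grad

x = [Fraction(1), Fraction(2)]
B_general = [[1, 2], [0, 3]]          # not symmetric
B_symmetric = [[2, 1], [1, 3]]        # symmetric, so the gradient is 2 <x B

g_general = gradient_row(x, B_general)
g_symmetric = gradient_row(x, B_symmetric)
twice_xB = [2 * sum(x[i] * B_symmetric[i][j] for i in range(2)) for j in range(2)]
```

For the non-symmetric matrix both terms $B^T$ and $B$ are needed; for the symmetric one the result collapses to $2\,\langle x\,B$.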

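Case 4. $q = \langle a\,X\,b\rangle$ (b-37)

where $\langle a$ is a p-dimensional row vector, $X$ is p×m, and $b\rangle$ is an
m-dimensional column vector.

The scalar q is a function of the matrix X of p rows and m
columns.

The scalar q can be written as the trace of the matrix

$$Q = b\rangle\langle a\,X \qquad (b-38)$$

with dimensions m×m = (m×p)(p×m).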
The differential of Q is

$$dQ = b\rangle\langle a\,dX \qquad (b-39)$$

By equation (b-37), differentiating

$$dq = \langle a\,dX\,b\rangle = \mathrm{tr}\,dQ. \qquad (b-40)$$

We seek a gradient matrix $\partial q/\partial X$ of m rows and p columns as one
of the factors of dQ, that is

$$dQ = \frac{\partial q}{\partial X}\,dX \qquad (b-41)$$

with dimensions m×m = (m×p)(p×m), such that

$$\mathrm{tr}\,dQ = dq. \qquad (b-42)$$

Clearly by equations (b-39) and (b-41), if

$$\frac{\partial q}{\partial X} = b\rangle\langle a \qquad (b-43)$$

then (b-42) is satisfied.
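The Case 4 gradient can be verified entry by entry. This sketch uses arbitrary illustrative values (p = 3, m = 2) and hypothetical helper names; since q is linear in X, a unit-step forward difference equals the partial derivative exactly. Note the layout: the gradient is m×p, so its (j, i) entry is the derivative with respect to the (i, j) entry of X.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def q_of(a, X, b):
    """q = <a X b> for a p-vector a, a p x m matrix X and an m-vector b."""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

a = [1, 2, 3]                       # p = 3
b = [4, 5]                          # m = 2
X = [[1, 0], [2, -1], [0, 3]]       # p x m

# Gradient dyad b><a: m rows, p columns.
gradient = [[b_j * a_i for a_i in a] for b_j in b]

# Forward differences with a unit step in each entry of X.
differences = [[0] * 3 for _ in range(2)]
for i in range(3):
    for j in range(2):
        bumped = [row[:] for row in X]
        bumped[i][j] += 1
        differences[j][i] = q_of(a, bumped, b) - q_of(a, X, b)

# The m x m matrix Q = b><a X has trace q, as in (b-38).
Q = matmul(gradient, X)
```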

An alternate, more direct, approach is given below. Partition X into
a row of column vectors (all contravariant vectors), then

$$q = \langle a\,[\,x_1\rangle, \ldots, x_m\rangle\,]\,b\rangle = [\,\langle a\,x_1\rangle, \ldots, \langle a\,x_m\rangle\,]\,b\rangle \qquad (b-44)$$


(b-45)


where each $q_i$ is a function of a single column vector.
The scalar differential of q is


(b-46) 


(b-1+7) 


(b-h8j 


(b-49) 
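or

$$\frac{\partial q}{\partial X} = b\rangle\langle a \qquad (b-55)$$

with dimensions m×p; hence in conclusion, if

$$q = \langle a\,X\,b\rangle$$

with $X$ of dimension p×m, then

$$\frac{\partial q}{\partial X} = b\rangle\langle a$$

with dimensions m×p.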


for  this  case  we  set 


B  a 


as  in  equation  (b-56),  then 


and we obtain the case 4, hence


or 


(b-56) 


(b-57) 


(b-58) 


(b-59) 


(b-6o) 


(b-6l) 


Case  6 


Q 


(b-62) 
where dQ is as in equation (b-41).


=  q 


Using  (b-70),  (b-7l),  (b-72),  (b-73)  in  (b-69) 


=  <|X + 


(b-69) 

(b-70) 

f 

(b-71) 

(b-72) 

(b-73) 


(b-7l0 


(b-75) 


Packaging  (b-75)  into  the  gradient  matrix  of  equation  (b-5l) 


(b-76) 


Consider the p×p matrix L which has factors as shown

$$L = B\,X \qquad (b-81)$$

with dimensions p×p = (p×k)(k×p),


where  X  is  a  variable  matrix. 


If  we  factor  B  into  its  column  space  and  X  into  its  row  space 


(b-82) 


(b-83)


The differential of ( ) is

$$dL = B\,dX.$$

The  factors  of  dL  can  also  be  expressed  as 


(b-84)


(b-85)


(b-86)


(b-87)
The differential of Equation ( ) is

$$d(\mathrm{tr}\,L) = d\ell = d\ell_{11} + \cdots + d\ell_{pp}$$


+ 


•  « 


(b-88) 


(b-89) 


where 


and 


•  .  b(o>]  =  B 
X  pxk 
k 


In  summary, 


(b-90) 


(b-91) 


If

$$L = B\,X$$

with dimensions p×p = (p×k)(k×p),


then

$$\frac{\partial\,(\mathrm{tr}\,L)}{\partial X} = B$$

with dimensions p×k.


(b-92)


(b-93)
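The summary result for the trace of a product can be checked the same way. In this sketch (arbitrary values, p = 2 and k = 3, hypothetical function name `trace_BX`), the gradient is stored in the p×k layout used above, so its (i, j) entry is the derivative of tr L with respect to the (j, i) entry of X. The trace is linear in X, so unit-step differences are exact.

```python
def trace_BX(B, X):
    """tr(B X) for a p x k matrix B and a k x p matrix X."""
    p, k = len(B), len(X)
    return sum(B[i][j] * X[j][i] for i in range(p) for j in range(k))

B = [[1, 2, 3],
     [4, 5, 6]]                 # p x k
X = [[1, 0],
     [2, 1],
     [-1, 3]]                   # k x p, the variable matrix

base = trace_BX(B, X)

# Perturb each entry X[j][i] by one and record the change in the trace
# at position (i, j), which gives a p x k array of partial derivatives.
gradient = [[0] * 3 for _ in range(2)]
for j in range(3):
    for i in range(2):
        bumped = [row[:] for row in X]
        bumped[j][i] += 1
        gradient[i][j] = trace_BX(B, bumped) - base
```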
APPENDIX C


MINIMIZATION 


Consider the linear surface

$$\ell = \langle b\,x\rangle \qquad (1)$$

and the quadratic surface

$$q = \langle x\,Q\,x\rangle \qquad (2)$$

and the difference

$$q - \ell = \phi. \qquad (3)$$


If $\ell$ is a constant, $\ell = \ell_0$, then we seek a vector x that lies on
the linear surface and on the quadratic surface such that the difference
between the linear surface and the quadratic surface is a minimum.

Differentiating


and 


or 


If  we  equate  the  gradient  vector  to  zero 


(9) 
By equation (  )  and  equation  (  )

2^Q  * 

and  solving  for^ 

Multiplying equation (11) by $b\rangle$ and using equation (

X  b^>  =  /b  a'1  b>  «  to 


.-1 


or 


to 


2  4>q"H> 

Using (13) in (11)


>x  =  to  ^  Q  1 


If 


to  a  1 


then 


-1 


(10) 

(11) 

(12) 

(13) 

(1U) 

(15) 
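The intermediate equations of this appendix did not survive reproduction, so the following sketch rests on an assumption: that the problem is to minimize the quadratic $\langle x\,Q\,x\rangle$ subject to the linear condition $\langle b\,x\rangle = \ell_0$ with $Q$ symmetric and positive definite. Under that assumption the standard Lagrange-multiplier result is $x\rangle = \ell_0\,Q^{-1} b\rangle / \langle b\,Q^{-1}\,b\rangle$. The numerical values and helper names are made up for the illustration.

```python
from fractions import Fraction

def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic; M must be full rank."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def quad_form(x, Q):
    return sum(x[i] * Q[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

Q = [[2, 1], [1, 2]]            # symmetric positive definite (assumed)
b = [1, 1]
l0 = 1                          # value of the linear surface

Q_inv_b = [inner(row, b) for row in inverse(Q)]           # Q^-1 b>
x_min = [l0 * v / inner(b, Q_inv_b) for v in Q_inv_b]     # l0 Q^-1 b> / <b Q^-1 b>
q_min = quad_form(x_min, Q)

# Any other point on the linear surface is x_min plus a multiple of a
# direction d with <b d> = 0; the quadratic should only increase there.
d = [1, -1]
nearby = [quad_form([xi + t * di for xi, di in zip(x_min, d)], Q)
          for t in (Fraction(-1), Fraction(-1, 10), Fraction(1, 10), Fraction(1))]
```

In this example the minimizer is (1/2, 1/2), it satisfies the linear condition, and moving along the constraint in either direction raises the quadratic.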